In this tutorial, we will learn how to set up an R Markdown document for reproducible statistical analysis and modeling, primarily using functions and syntax from the knitr package. Please ensure you have the Chapter 9 folder downloaded, as this .Rmd depends on that file structure to run.
In this tutorial, you will learn methods to ensure your analysis is reproducible from the moment you start coding to the moment it is published. You will learn how to properly set up pipelines to rerun analysis and present results every time you knit, different methods to dynamically connect data gathering and source code to your output documents, and how to keep markdown documents up to date without overloading your computer when your analysis is computationally intensive.
Coding is hard. It takes a long time, and you’ll likely have to go back to the same analysis multiple times to tweak and change things. It is important to set up a good workflow early so you’re not scrambling to make your research reproducible at the end of the process. You’ll save yourself a lot of time if your analysis is reproducible and your results are presented the way you want them every time you knit your document.
Through this tutorial, students will learn to:

- set up pipelines that rerun analysis and present current results on every knit
- dynamically link data gathering and source code to output documents
- keep markdown documents up to date without overloading the computer when analysis is computationally intensive
Make sure you have the following packages: “knitr”, “rmarkdown”, “bookdown”, “formattable”, “kableExtra”, “dplyr”, “magrittr”, “prettydoc”, “htmltools”, “knitcitations”, “bibtex”, “devtools”, “tidyverse”, “ebirdst”, “sf”, “raster”, “terra”, “patchwork”, “tigris”, “rnaturalearth”, “ggplot2”, “rnaturalearthhires”. Below are the code chunks for loading packages and setting up the document from Ch. 2:
###~~~
# Load R packages
###~~~
#Create a vector w/ the required R packages
# --> If you have a new dependency, don't forget to add it in this vector
pkg <- c("knitr", "rmarkdown", "bookdown", "formattable", "kableExtra", "dplyr", "magrittr", "prettydoc", "htmltools", "knitcitations", "bibtex", "devtools", "tidyverse", "ebirdst", "sf", "raster", "terra", "patchwork", "tigris", "rnaturalearth", "ggplot2", "rnaturalearthhires")
##~~~
#2. Check if pkg are already installed on your computer
##~~~
print("Check if packages are installed")
# The line below builds a vector of the required packages that are not yet
# installed on this computer; if everything is installed, it will be empty
new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
##~~~
#3. Install missing packages
##~~~
# Use an if/else statement to check whether packages have to be installed
# WARNING: If your target R package is not deposited on CRAN then you need to adjust code/function
if(length(new.pkg) > 0){
print(paste("Install missing package(s):", paste(new.pkg, collapse = ", ")))
install.packages(new.pkg, dependencies = TRUE)
}else{
print("All packages are already installed!")
}
##~~~
#4. Load all required packages
##~~~
print("Load packages and return status")
# Use sapply() to require() each package and return its load status
# (to learn more about the function, type ?sapply in the R console)
# This is just an easier way to load all packages at once
sapply(pkg, require, character.only = TRUE)
# Debug tip: the right-pointing arrow next to a chunk runs its code in the
# console so you can step through it line by line; this can also resolve
# the "CRAN needs a mirror" error
# Generate a BibTeX citation file for all loaded R packages used to
# produce this report. Notice the package::function syntax used to
# call the function.
knitr::write_bib(.packages(), file = "packages.bib")
# .packages() invisibly returns the names of all packages loaded in the
# current R session (to see them printed, use .packages(all.available = TRUE)).
# This ensures all packages used in your code get citation entries in the
# .bib file, which is then used in the Appendix to cite the code.
Set up chunk options:
### Chunk options: see http://yihui.name/knitr/options/ ###
### Text output
opts_chunk$set(echo = TRUE, warning = TRUE, message = TRUE, include = TRUE)
## Code formatting
opts_chunk$set(tidy = TRUE, tidy.opts = list(blank = FALSE, width.cutoff = 60),
highlight = TRUE)
## Code caching
opts_chunk$set(cache = 2, cache.path = "cache/")
## Plot output: the first dev listed is the master for the
## output document
opts_chunk$set(fig.path = "Figures_MS/", dev = c("png", "pdf"),
dpi = 300)
## Figure positioning
opts_chunk$set(fig.pos = "H")
The data for this tutorial should be in the Ch. 9 folder you downloaded. For this chapter, we will be working with a couple of datasets. Mike is working on consolidating some exploratory analyses he did at different times over the last year. He wants to get them in the same place and make sure they are reproducible. One dataset comes from the supplementary material of a paper (Schuetz and Johnston, 2019): a csv file pulled from the GitHub repository of one of the paper’s authors. The other dataset comes from the R package ebirdst, which interfaces with data from the community science platform eBird. We’ll start with a dataset of “encounter rates” derived from eBird data, defined as the percentage of checklists that include a particular species in a given area. These data have been analyzed for the United States by state, and Mike is getting ready to calculate encounter rates for species in Sonora, Mexico that are also found in Idaho. Before digging into the eBird data for Sonora, Mike wants to get to know the Idaho dataset so he knows what species to look for in the Sonora data. Let’s get that data ready.
First, please set your working directory to the Chapter 9 folder you downloaded. You can do this through the Session menu (Set Working Directory > To Source File Location) or with the command setwd(*your file path here*). Then we will load the encounter rate data and transform it into the objects we need.
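As a sketch, setting the working directory from the console might look like the following (the path below is hypothetical; substitute the actual location of the Chapter 9 folder on your machine):

```r
# Hypothetical path -- replace with your own Chapter 9 location
setwd("C:/Users/yourname/Documents/Chapter9")
# Confirm where relative paths will now resolve from
getwd()
```

All file paths in the rest of this tutorial (such as "01_raw_data/...") are relative to this folder.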
library("tidyverse")
# load csv file of data
state_enc_rate <- read.csv("01_raw_data/state.level.enc.rate.csv")
summary(state_enc_rate)
## common.name scientific.name state query.volume.state
## Length:31722 Length:31722 Length:31722 Min. : 0.00
## Class :character Class :character Class :character 1st Qu.: 0.00
## Mode :character Mode :character Mode :character Median : 0.00
## Mean : 10.35
## 3rd Qu.: 8.00
## Max. :100.00
## encounter.state.normalized
## Min. : 0.0000
## 1st Qu.: 0.0000
## Median : 0.3636
## Mean : 15.0947
## 3rd Qu.: 21.4322
## Max. :100.0000
head(state_enc_rate)
## common.name scientific.name state query.volume.state
## 1 Abert's towhee Melozone aberti Alabama 0
## 2 Abert's towhee Melozone aberti Alaska 0
## 3 Abert's towhee Melozone aberti Arizona 100
## 4 Abert's towhee Melozone aberti Arkansas 0
## 5 Abert's towhee Melozone aberti California 0
## 6 Abert's towhee Melozone aberti Colorado 0
## encounter.state.normalized
## 1 0.000000
## 2 0.000000
## 3 100.000000
## 4 0.000000
## 5 2.357214
## 6 0.000000
There is more data in this dataframe than we need, so let’s do some transformation to get the data where we want it. Specifically, we are going to drop a column we don’t need, filter the data down to just Idaho, and filter for species with encounter rates higher than 50%.
# Get rid of unnecessary columns
state_enc_rate_clean <- state_enc_rate %>%
    dplyr::select(-query.volume.state)  # not needed for analysis
view(state_enc_rate_clean)
# filter to just Idaho
state_enc_rate_clean_filtered <- state_enc_rate_clean %>%
    filter(state == "Idaho")
view(state_enc_rate_clean_filtered)
# get rid of species with no encounter rate
ID_enc_rate <- state_enc_rate_clean_filtered %>%
filter(encounter.state.normalized != 0)
# get rid of encounter rates below 50 to focus on commonly
# encountered species
ID_enc_rate <- ID_enc_rate %>%
filter(encounter.state.normalized > 50)
# check size of final dataframe
summary(ID_enc_rate)
## common.name scientific.name state
## Length:63 Length:63 Length:63
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
## encounter.state.normalized
## Min. : 50.01
## 1st Qu.: 58.34
## Median : 69.38
## Mean : 71.40
## 3rd Qu.: 83.32
## Max. :100.00
# make a table for markdown output
kable(ID_enc_rate)
| common.name | scientific.name | state | encounter.state.normalized |
|---|---|---|---|
| American coot | Fulica americana | Idaho | 55.75646 |
| American kestrel | Falco sparverius | Idaho | 100.00000 |
| American robin | Turdus migratorius | Idaho | 72.07650 |
| American wigeon | Mareca americana | Idaho | 60.66470 |
| bank swallow | Riparia riparia | Idaho | 63.75313 |
| belted kingfisher | Megaceryle alcyon | Idaho | 50.01036 |
| black-billed magpie | Pica hudsonia | Idaho | 83.85025 |
| black-headed grosbeak | Pheucticus melanocephalus | Idaho | 57.82899 |
| Brewer’s blackbird | Euphagus cyanocephalus | Idaho | 66.23826 |
| Bullock’s oriole | Icterus bullockii | Idaho | 72.90967 |
| California quail | Callipepla californica | Idaho | 100.00000 |
| calliope hummingbird | Selasphorus calliope | Idaho | 92.01797 |
| Canada goose | Branta canadensis | Idaho | 69.74989 |
| Cassin’s finch | Haemorhous cassinii | Idaho | 78.77421 |
| Cassin’s vireo | Vireo cassinii | Idaho | 100.00000 |
| cedar waxwing | Bombycilla cedrorum | Idaho | 64.66979 |
| chukar | Alectoris chukar | Idaho | 52.27404 |
| cinnamon teal | Spatula cyanoptera | Idaho | 50.02885 |
| cliff swallow | Petrochelidon pyrrhonota | Idaho | 62.71355 |
| common goldeneye | Bucephala clangula | Idaho | 73.44130 |
| common merganser | Mergus merganser | Idaho | 77.87708 |
| common raven | Corvus corax | Idaho | 56.20139 |
| dark-eyed junco | Junco hyemalis | Idaho | 70.30395 |
| dusky flycatcher | Empidonax oberholseri | Idaho | 88.66510 |
| Eurasian collared-dove | Streptopelia decaocto | Idaho | 68.98302 |
| European starling | Sturnus vulgaris | Idaho | 60.81733 |
| evening grosbeak | Coccothraustes vespertinus | Idaho | 53.82014 |
| great horned owl | Bubo virginianus | Idaho | 99.39787 |
| Hammond’s flycatcher | Empidonax hammondii | Idaho | 100.00000 |
| house finch | Haemorhous mexicanus | Idaho | 60.00921 |
| killdeer | Charadrius vociferus | Idaho | 56.41970 |
| lazuli bunting | Passerina amoena | Idaho | 85.12329 |
| Lewis’s woodpecker | Melanerpes lewis | Idaho | 71.66795 |
| long-eared owl | Asio otus | Idaho | 66.75941 |
| MacGillivray’s warbler | Geothlypis tolmiei | Idaho | 76.69511 |
| mallard | Anas platyrhynchos | Idaho | 87.65577 |
| mountain chickadee | Poecile gambeli | Idaho | 63.37515 |
| northern flicker | Colaptes auratus | Idaho | 86.56977 |
| northern harrier | Circus cyaneus | Idaho | 71.13981 |
| northern pygmy-owl | Glaucidium gnoma | Idaho | 69.37824 |
| northern rough-winged swallow | Stelgidopteryx serripennis | Idaho | 58.52182 |
| northern saw-whet owl | Aegolius acadicus | Idaho | 73.17100 |
| olive-sided flycatcher | Contopus cooperi | Idaho | 55.03477 |
| pine siskin | Spinus pinus | Idaho | 78.22397 |
| prairie falcon | Falco mexicanus | Idaho | 73.68943 |
| red crossbill | Loxia curvirostra | Idaho | 52.45629 |
| red-breasted nuthatch | Sitta canadensis | Idaho | 89.29945 |
| red-tailed hawk | Buteo jamaicensis | Idaho | 83.58497 |
| red-winged blackbird | Agelaius phoeniceus | Idaho | 65.22796 |
| ring-necked duck | Aythya collaris | Idaho | 87.38486 |
| rock pigeon | Columba livia | Idaho | 51.20198 |
| sage thrasher | Oreoscoptes montanus | Idaho | 53.83167 |
| sharp-shinned hawk | Accipiter striatus | Idaho | 53.95897 |
| song sparrow | Melospiza melodia | Idaho | 58.15657 |
| spotted sandpiper | Actitis macularius | Idaho | 76.10001 |
| Swainson’s hawk | Buteo swainsoni | Idaho | 95.49056 |
| western grebe | Aechmophorus occidentalis | Idaho | 67.58516 |
| western kingbird | Tyrannus verticalis | Idaho | 53.46191 |
| western screech-owl | Megascops kennicottii | Idaho | 83.04902 |
| western tanager | Piranga ludoviciana | Idaho | 98.95006 |
| western wood-pewee | Contopus sordidulus | Idaho | 67.01473 |
| willow flycatcher | Empidonax traillii | Idaho | 57.60302 |
| yellow warbler | Setophaga petechia | Idaho | 67.49039 |
# create clean data folder and save new data file
dir.create("02_Clean_data", showWarnings = FALSE)
write.csv(ID_enc_rate, file = "02_Clean_data/ID_enc_rate.csv",
row.names = FALSE)
Although dynamically linking source code to markdown documents is the main goal of this chapter, sometimes it makes sense to include short code chunks directly in your markdown document. There are a variety of customizable options for how the code and its output are displayed in the final knitted document.
First, we will learn how to set up a code chunk. We use chunk options/arguments from the knitr package to choose which parts of our code chunks appear in the final html document.
Code chunk options/arguments:
- include = FALSE hides code and results from the output document (html in our case), but the code still runs
- eval = FALSE includes code in the document without running it
- echo = FALSE hides code but not results from the document
- results = 'hide' hides results but not code from the document
- warning, message, error = FALSE hide warnings, messages, and error messages from the document
- fig.show = 'hide' hides figures specifically from the final output document

For eval and echo, you can tell knitr which expressions in the chunk to include using numeric vectors. For example, eval = 1:2 will only evaluate the first two expressions of code in the chunk.
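For example, a chunk header combining several of these options might look like the line below (the chunk label `clean-data` is hypothetical); it hides the code, warnings, and messages while still showing results:

```
{r clean-data, echo = FALSE, warning = FALSE, message = FALSE}
```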
Take 5 minutes to discuss with the person sitting next to you. Imagine you have an R markdown document with chunks of analysis. You are now sending just the html document outputted from it to your PI for review. What chunk arguments might you set? What if you are publishing the final version of that html document? Now share with the class.
Code chunks are best used when the code is simple or short. To demonstrate analysis in a code chunk for this section, we are going to do a little more data transformation to make a couple of objects of the Idaho encounter rate data grouped taxonomically, for some descriptive statistics we will use later. By the end of this tutorial, we will create an analysis comparing the mean encounter rates of songbirds and waterfowl in the Idaho dataset.
# Pull passerines from ID encounter rates list
passerine_enc_rate <- ID_enc_rate %>%
filter(common.name %in% c("American robin", "bank swallow",
"black-billed magpie", "black-headed grosbeak", "Brewer's blackbird",
"Bullock's oriole", "Cassin's finch", "Cassin's vireo",
"cedar waxwing", "cliff swallow", "dark-eyed junco",
"dusky flycatcher", "European starling", "evening grosbeak",
"Hammond's flycatcher", "house finch", "lazuli bunting",
"MacGillivray's warbler", "mountain chickadee", "northern rough-winged swallow",
"olive-sided flycatcher", "pine siskin", "red crossbill",
"red-winged blackbird", "sage thrasher", "song sparrow",
"western kingbird", "western tanager", "western wood-pewee",
"willow flycatcher", "yellow warbler"))
waterfowl_enc_rate <- ID_enc_rate %>%
filter(common.name %in% c("American coot", "American wigeon",
"cinnamon teal", "common goldeneye", "common merganser",
"mallard", "ring-necked duck", "western grebe"))
Sometimes you may want to have R code or output appear inline with the rest of the markdown document. These can be static or dynamic.
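For example, a dynamic inline value can be embedded in a sentence with backtick-r syntax; a minimal sketch, assuming `ID_enc_rate` is already loaded:

```
There are `r nrow(ID_enc_rate)` species with encounter rates above 50%.
```

When the document is knit, the expression is replaced by its current value, so the sentence stays up to date whenever the data changes.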
ID_enc_rate <- state_enc_rate_clean_filtered %>%
    filter(encounter.state.normalized != 0)

Bogging down markdown documents with lots of long code chunks can impair readability. This is where modular analysis comes in. Modular analysis is a way to dynamically include analysis in your .rmd by keeping the script for an analysis in a separate file and then linking/calling it from the .rmd. Changing the script file will change the output in the .rmd, and that output can be used for further analysis in the .rmd. Other situations where it may be better to link to a script containing analysis code include needing the same code for multiple output documents and making it easier for other researchers to find specific parts of your analysis. We will go over two ways to dynamically include analysis in your R Markdown document: locally sourced modular analysis and url-sourced modular analysis.
We can link to previously written code stored locally on your computer that you want to pull into an analysis. To do so, we save that code in its own R file and call it with base R’s source function. This method is best suited to the early stages of your research, as others won’t have access to your local files.
Here is a string of code that was used to produce a figure. It is also made from eBird data, but instead of starting with someone else’s calculations, we are interacting with the package ebirdst, which pulls data from Cornell’s eBird database.
# Note eval = FALSE: this code chunk won't actually run. We are going
# to use an R file with the same code for this exercise.

# Load packages
library(ggplot2)
library(sf)
library(raster)
library(terra)
library(patchwork)
library(tigris)
library(rnaturalearth)
library(ebirdst)
# Allows access to data in the ebirdst package
set_ebirdst_access_key("aj6vlciqa1gd", overwrite = TRUE)
# Get the state boundary for Sonora, Mexico
region_boundary <- ne_states(iso_a2 = "MX", returnclass = "sf") |>
filter(name == "Sonora")
# download data if they haven't already been downloaded
# only weekly 3km relative abundance, median and confidence
# limits
ebirdst_download_status("ameavo", pattern = "abundance_(median|upper|lower)_3km")
# load the median weekly relative abundance and lower/upper
# confidence limits
abd_median <- load_raster("ameavo", product = "abundance", metric = "median")
abd_lower <- load_raster("ameavo", product = "abundance", metric = "lower")
abd_upper <- load_raster("ameavo", product = "abundance", metric = "upper")
# project region boundary to match raster data
region_boundary_proj <- st_transform(region_boundary, st_crs(abd_median))
# extract values within region and calculate the mean
abd_median_region <- extract(abd_median, region_boundary_proj,
fun = "mean", na.rm = TRUE, ID = FALSE)
abd_lower_region <- extract(abd_lower, region_boundary_proj,
fun = "mean", na.rm = TRUE, ID = FALSE)
abd_upper_region <- extract(abd_upper, region_boundary_proj,
fun = "mean", na.rm = TRUE, ID = FALSE)
# transform to data frame format with rows corresponding to
# weeks
chronology_SO <- data.frame(week = as.Date(names(abd_median)),
median = as.numeric(abd_median_region), lower = as.numeric(abd_lower_region),
upper = as.numeric(abd_upper_region))
# Make a graph
AMAV_SO <- ggplot(chronology_SO) + aes(x = week, y = median) +
geom_ribbon(aes(ymin = lower, ymax = upper), fill = "green",
alpha = 0.4) + geom_line(color = "purple") + scale_x_date(date_labels = "%b",
date_breaks = "1 month") + theme_classic() + labs(x = "Week",
y = "Mean relative abundance in Sonora", title = "Migration chronology for American Avocet in Sonora")
AMAV_SO
We have put the code from the code chunk above into a separate R file in the Ch. 9 folder. Now we are going to call it with the source function. Try running the following code chunk:
# Run main analysis
source("03_linked_scripts/Sonora_AMAV_migration_chron.R")
## eBird Status and Trends access key stored in: C:/Users/rosey/.Renviron
## The rnaturalearthhires package needs to be installed.
## Installing the rnaturalearthhires package.
## WARNING: Rtools is required to build R packages, but is not currently installed.
##
## Please download and install Rtools 4.5 from https://cran.r-project.org/bin/windows/Rtools/.
##
## Using GitHub PAT from the git credential store.
##
## Downloading GitHub repo ropensci/rnaturalearthhires@HEAD
## ── R CMD build ─────────────────────────────────────────────────────────────────
## WARNING: Rtools is required to build R packages, but is not currently installed.
##
## Please download and install Rtools 4.5 from https://cran.r-project.org/bin/windows/Rtools/.
## ✔ checking for file 'C:\Users\rosey\AppData\Local\Temp\RtmpS2BsBb\remotesa96c2623458c\ropensci-rnaturalearthhires-e4736f6/DESCRIPTION'
## ─ preparing 'rnaturalearthhires':
## ✔ checking DESCRIPTION meta-information
## ─ checking for LF line-endings in source and make files and shell scripts
## ─ checking for empty or unneeded directories
## ─ building 'rnaturalearthhires_1.0.0.9000.tar.gz'
##
##
## Installing package into 'C:/Users/rosey/AppData/Local/R/win-library/4.5'
## (as 'lib' is unspecified)
## Downloading Status Data Products for ameavo
## Data already exists, use force = TRUE to re-download.
AMAV_SO
Once you are further along in your research, you may want to make sure your file is publicly accessible to ensure reproducibility. You can host the file in a GitHub repository and use the source_url command from the devtools package to call it.
We have set up a pre-made GitHub file for this section. It contains the following code:
region_boundary <- ne_states(iso_a2 = "US") |>
filter(name == "Idaho")
# download data if they haven't already been downloaded
# only weekly 3km relative abundance, median and confidence
# limits
ebirdst_download_status("ameavo", pattern = "abundance_(median|upper|lower)_3km")
# load the median weekly relative abundance and lower/upper
# confidence limits
abd_median <- load_raster("ameavo", product = "abundance", metric = "median")
abd_lower <- load_raster("ameavo", product = "abundance", metric = "lower")
abd_upper <- load_raster("ameavo", product = "abundance", metric = "upper")
# project region boundary to match raster data
region_boundary_proj <- st_transform(region_boundary, st_crs(abd_median))
# extract values within region and calculate the mean
abd_median_region <- extract(abd_median, region_boundary_proj,
fun = "mean", na.rm = TRUE, ID = FALSE)
abd_lower_region <- extract(abd_lower, region_boundary_proj,
fun = "mean", na.rm = TRUE, ID = FALSE)
abd_upper_region <- extract(abd_upper, region_boundary_proj,
fun = "mean", na.rm = TRUE, ID = FALSE)
# transform to data frame format with rows corresponding to
# weeks
chronology_ID <- data.frame(week = as.Date(names(abd_median)),
median = as.numeric(abd_median_region), lower = as.numeric(abd_lower_region),
upper = as.numeric(abd_upper_region))
AMAV_ID <- ggplot(chronology_ID) + aes(x = week, y = median) +
geom_ribbon(aes(ymin = lower, ymax = upper), fill = "green",
alpha = 0.4) + geom_line(color = "purple") + scale_x_date(date_labels = "%b",
date_breaks = "1 month") + theme_classic() + labs(x = "Week",
y = "Mean relative abundance in Washington", title = "Migration chronology for American Avocet in Idaho")
AMAV_ID
Use the source_url command from the devtools package to call the file. Run the following code chunk.
source_url("https://raw.githubusercontent.com/rosean829/Rep-Sci-Chapter-9/main/03_linked_scripts/Idaho_AMAV_migration_chronology.R")
## ℹ SHA-1 hash of file is "dd74275f1b0a1301afdd9da8cf3db53c2fca5761"
## Downloading Status Data Products for ameavo
##
## Data already exists, use force = TRUE to re-download.
AMAV_ID
When an analysis involves simulation, the random number generator should be controlled so that others can replicate your results. For this tutorial, we will do an exercise to learn the importance of using the set.seed command to ensure replicability.
You and the person next to you will use the same distribution parameters but different set.seed values, then the same value. Do you get the same result in each case?
Have the person sitting on the left run the following simple regression with set.seed(125). The person on the right will run the next code chunk. Compare your results.
## The person on the left of each pair will set seed as 125
set.seed(125)
# Draw 1000 numbers
Draw1 <- rnorm(1000, mean = 0, sd = 2)
# Draw 1000 other numbers
Draw2 <- rnorm(1000, mean = 5, sd = 1.5)
# Plot Draw1 and Draw2
plot(Draw2)
plot(Draw1)
# Create a regression with Draw1 and Draw2, then plot it.
model <- lm(Draw1 ~ Draw2)
plot(model)
## Save the 1st regression in its own Rdata file (this will
## be relevant in a later section)
save(model, file = "model.RData")
The person sitting on the right will run the following code chunk (same code as the chunk above but with set.seed(150)). After comparing your results with your partner, run the set.seed(125) chunk. Compare your results.
## The person on the right of each pair will set seed as 150
set.seed(150)
# Draw 1000 numbers
Draw3 <- rnorm(1000, mean = 0, sd = 2)
# Draw 1000 other numbers
Draw4 <- rnorm(1000, mean = 5, sd = 1.5)
# Plot both Draw3 and Draw4
plot(Draw3)
plot(Draw4)
# Create a regression from Draw3 and Draw4, then plot it.
model1 <- lm(Draw3 ~ Draw4)
plot(model1)
When an analysis is too computationally intensive, knowing how to use the caching options from knitr is helpful. To cache a code chunk, use the argument cache = TRUE. A cached chunk will only rerun if the chunk’s contents or options change.
However, subsequent chunks will not be able to access objects created by a cached chunk or any packages loaded in it (all packages should be loaded in their own chunk anyway), and a cached chunk will not rerun when a previous chunk whose code it depends on changes. You can solve this last problem with the dependson argument. A chunk with the dependson argument will rerun whenever the specified chunks rerun. To specify chunks, use a vector of their labels or their positions from the start of the document. For example, dependson = c(2, 3) will rerun the cached chunk if the 2nd or 3rd chunk of the document is also rerun.
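A sketch of the chunk header described above (the chunk labels are hypothetical):

```
{r model-fit, cache = TRUE, dependson = c("load-data", "clean-data")}
```

Here the cached model-fit chunk reruns whenever the load-data or clean-data chunks rerun.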
The two code chunks from earlier have the argument cache = TRUE. If you haven’t knit the document yet, knit it now. If you have, do you remember how long the knit took? Now that you’ve knit once, knit the document again. What happens to the knit time?
You can also save objects created by a cached chunk to a separate .RData file and load them in subsequent chunks with the load command, so the cached chunk’s outputs stay available even when it does not run. The last line of code in one of the previous chunks (save(model, file = "model.RData")) saved its output to an .RData file. In the next chunk, we call and plot that output without rerunning the chunk that created it:
# Load Sample
load(file = "model.RData")
summary(model)
##
## Call:
## lm(formula = Draw1 ~ Draw2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.0946 -1.2920 0.0146 1.4463 5.8039
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.22228 0.22736 -0.978 0.328
## Draw2 0.01989 0.04307 0.462 0.644
##
## Residual standard error: 2.016 on 998 degrees of freedom
## Multiple R-squared: 0.0002136, Adjusted R-squared: -0.0007882
## F-statistic: 0.2132 on 1 and 998 DF, p-value: 0.6444
plot(model)
Citations of all R packages used to generate this report. The chunk below reads and prints the citations stored in packages.bib.
### Load R package
library("knitcitations")
### Process and print citations in packages.bib
### Clear any bibliography that could be in the cache
cleanbib()
# Set pandoc as the default output option for bib
options(citation_format = "pandoc")
# Read and print bib from file
read.bibtex(file = "packages.bib")
[1] J. Allaire, Y. Xie, C. Dervieux, et al. rmarkdown: Dynamic Documents for R. R package version 2.30. 2025. https://github.com/rstudio/rmarkdown.
[2] S. M. Bache and H. Wickham. magrittr: A Forward-Pipe Operator for R. R package version 2.0.4. 2025. https://magrittr.tidyverse.org.
[3] R. S. Bivand, E. Pebesma, and V. Gomez-Rubio. Applied spatial data analysis with R, Second edition. Springer, NY, 2013. https://asdar-book.org/.
[4] C. Boettiger. knitcitations: Citations for Knitr Markdown Files. R package version 1.0.12. 2021. https://github.com/cboettig/knitcitations.
[5] J. Cheng, C. Sievert, B. Schloerke, et al. htmltools: Tools for HTML. R package version 0.5.8.1. 2024. https://github.com/rstudio/htmltools.
[6] R. Francois and D. Hernangómez. bibtex: Bibtex Parser. R package version 0.5.1. 2023. https://github.com/ropensci/bibtex.
[7] G. Grolemund and H. Wickham. “Dates and Times Made Easy with lubridate”. In: Journal of Statistical Software 40.3 (2011), pp. 1-25. https://www.jstatsoft.org/v40/i03/.
[8] R. J. Hijmans. raster: Geographic Data Analysis and Modeling. R package version 3.6-32. 2025. https://rspatial.org/raster.
[9] R. J. Hijmans. terra: Spatial Data Analysis. R package version 1.8-70. 2025. https://rspatial.org/.
[10] P. Massicotte and A. South. rnaturalearth: World Map Data from Natural Earth. R package version 1.1.0. 2025. https://docs.ropensci.org/rnaturalearth/.
[11] K. Müller and H. Wickham. tibble: Simple Data Frames. R package version 3.3.0. 2025. https://tibble.tidyverse.org/.
[12] E. Pebesma. “Simple Features for R: Standardized Support for Spatial Vector Data”. In: The R Journal 10.1 (2018), pp. 439-446. DOI: 10.32614/RJ-2018-009. https://doi.org/10.32614/RJ-2018-009.
[13] E. Pebesma. sf: Simple Features for R. R package version 1.0-21. 2025. https://r-spatial.github.io/sf/.
[14] E. J. Pebesma and R. Bivand. “Classes and methods for spatial data in R”. In: R News 5.2 (Nov. 2005), pp. 9-13. https://CRAN.R-project.org/doc/Rnews/.
[15] E. Pebesma and R. Bivand. Spatial Data Science: With applications in R. Chapman and Hall/CRC, 2023. DOI: 10.1201/9780429459016. https://r-spatial.org/book/.
[16] E. Pebesma and R. Bivand. sp: Classes and Methods for Spatial Data. R package version 2.2-0. 2025. https://github.com/edzer/sp/.
[17] T. L. Pedersen. patchwork: The Composer of Plots. R package version 1.3.2. 2025. https://patchwork.data-imaginist.com.
[18] Y. Qiu. prettydoc: Creating Pretty Documents from R Markdown. R package version 0.4.1. 2021. https://github.com/yixuan/prettydoc.
[19] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2025. https://www.R-project.org/.
[20] K. Ren and K. Russell. formattable: Create Formattable Data Structures. R package version 0.2.1. 2021. https://renkun-ken.github.io/formattable/.
[21] V. Spinu, G. Grolemund, and H. Wickham. lubridate: Make Dealing with Dates a Little Easier. R package version 1.9.4. 2024. https://lubridate.tidyverse.org.
[22] M. Strimas-Mackey, S. Ligocki, T. Auer, et al. ebirdst: Access and Analyze eBird Status and Trends Data Products. R package version 3.2023.1. 2025. https://ebird.github.io/ebirdst/.
[23] K. Walker. tigris: Load Census TIGER/Line Shapefiles. R package version 2.2.1. 2025. https://github.com/walkerke/tigris.
[24] H. Wickham. forcats: Tools for Working with Categorical Variables (Factors). R package version 1.0.1. 2025. https://forcats.tidyverse.org/.
[25] H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016. ISBN: 978-3-319-24277-4. https://ggplot2.tidyverse.org.
[26] H. Wickham. stringr: Simple, Consistent Wrappers for Common String Operations. R package version 1.5.2. 2025. https://stringr.tidyverse.org.
[27] H. Wickham. tidyverse: Easily Install and Load the Tidyverse. R package version 2.0.0. 2023. https://tidyverse.tidyverse.org.
[28] H. Wickham, M. Averick, J. Bryan, et al. “Welcome to the tidyverse”. In: Journal of Open Source Software 4.43 (2019), p. 1686. DOI: 10.21105/joss.01686.
[29] H. Wickham, J. Bryan, M. Barrett, et al. usethis: Automate Package and Project Setup. R package version 3.2.1. 2025. https://usethis.r-lib.org.
[30] H. Wickham, W. Chang, L. Henry, et al. ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. R package version 4.0.0. 2025. https://ggplot2.tidyverse.org.
[31] H. Wickham, R. François, L. Henry, et al. dplyr: A Grammar of Data Manipulation. R package version 1.1.4. 2023. https://dplyr.tidyverse.org.
[32] H. Wickham and L. Henry. purrr: Functional Programming Tools. R package version 1.1.0. 2025. https://purrr.tidyverse.org/.
[33] H. Wickham, J. Hester, and J. Bryan. readr: Read Rectangular Text Data. R package version 2.1.5. 2024. https://readr.tidyverse.org.
[34] H. Wickham, J. Hester, W. Chang, et al. devtools: Tools to Make Developing R Packages Easier. R package version 2.4.6. 2025. https://devtools.r-lib.org/.
[35] H. Wickham, D. Vaughan, and M. Girlich. tidyr: Tidy Messy Data. R package version 1.3.1. 2024. https://tidyr.tidyverse.org.
[36] Y. Xie. bookdown: Authoring Books and Technical Documents with R Markdown. Boca Raton, Florida: Chapman and Hall/CRC, 2016. ISBN: 978-1138700109. https://bookdown.org/yihui/bookdown.
[37] Y. Xie. bookdown: Authoring Books and Technical Documents with R Markdown. R package version 0.45. 2025. https://github.com/rstudio/bookdown.
[38] Y. Xie. Dynamic Documents with R and knitr. 2nd. ISBN 978-1498716963. Boca Raton, Florida: Chapman and Hall/CRC, 2015. https://yihui.org/knitr/.
[39] Y. Xie. “knitr: A Comprehensive Tool for Reproducible Research in R”. In: Implementing Reproducible Computational Research. Ed. by V. Stodden, F. Leisch and R. D. Peng. ISBN 978-1466561595. Chapman and Hall/CRC, 2014.
[40] Y. Xie. knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.50. 2025. https://yihui.org/knitr/.
[41] Y. Xie, J. Allaire, and G. Grolemund. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman and Hall/CRC, 2018. ISBN: 9781138359338. https://bookdown.org/yihui/rmarkdown.
[42] Y. Xie, C. Dervieux, and E. Riederer. R Markdown Cookbook. Boca Raton, Florida: Chapman and Hall/CRC, 2020. ISBN: 9780367563837. https://bookdown.org/yihui/rmarkdown-cookbook.
[43] H. Zhu. kableExtra: Construct Complex Table with kable and Pipe Syntax. R package version 1.4.0. 2024. http://haozhu233.github.io/kableExtra/.
Version information about R, the operating system (OS), and attached or loaded R packages. This appendix was generated using sessionInfo().
# Load and provide all packages and versions
sessionInfo()
## R version 4.5.1 (2025-06-13 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26100)
##
## Matrix products: default
## LAPACK version 3.12.1
##
## locale:
## [1] LC_COLLATE=English_United States.utf8
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: America/Denver
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rnaturalearthhires_1.0.0.9000 rnaturalearth_1.1.0
## [3] tigris_2.2.1 patchwork_1.3.2
## [5] terra_1.8-70 raster_3.6-32
## [7] sp_2.2-0 sf_1.0-21
## [9] ebirdst_3.2023.1 lubridate_1.9.4
## [11] forcats_1.0.1 stringr_1.5.2
## [13] purrr_1.1.0 readr_2.1.5
## [15] tidyr_1.3.1 tibble_3.3.0
## [17] ggplot2_4.0.0 tidyverse_2.0.0
## [19] devtools_2.4.6 usethis_3.2.1
## [21] bibtex_0.5.1 knitcitations_1.0.12
## [23] htmltools_0.5.8.1 prettydoc_0.4.1
## [25] magrittr_2.0.4 dplyr_1.1.4
## [27] kableExtra_1.4.0 formattable_0.2.1
## [29] bookdown_0.45 rmarkdown_2.30
## [31] knitr_1.50
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 viridisLite_0.4.2 farver_2.1.2 S7_0.2.0
## [5] fastmap_1.2.0 digest_0.6.37 timechange_0.3.0 lifecycle_1.0.4
## [9] ellipsis_0.3.2 compiler_4.5.1 rlang_1.1.6 sass_0.4.10
## [13] tools_4.5.1 yaml_2.3.10 htmlwidgets_1.6.4 pkgbuild_1.4.8
## [17] classInt_0.4-11 plyr_1.8.9 xml2_1.4.1 RColorBrewer_1.1-3
## [21] pkgload_1.4.1 KernSmooth_2.23-26 withr_3.0.2 grid_4.5.1
## [25] e1071_1.7-16 scales_1.4.0 cli_3.6.5 generics_0.1.4
## [29] remotes_2.5.0 rstudioapi_0.17.1 httr_1.4.7 tzdb_0.5.0
## [33] sessioninfo_1.2.3 DBI_1.2.3 cachem_1.1.0 proxy_0.4-27
## [37] formatR_1.14 vctrs_0.6.5 jsonlite_2.0.0 hms_1.1.4
## [41] systemfonts_1.3.1 jquerylib_0.1.4 units_1.0-0 glue_1.8.0
## [45] RefManageR_1.4.0 codetools_0.2-20 stringi_1.8.7 gtable_0.3.6
## [49] pillar_1.11.1 rappdirs_0.3.3 R6_2.6.1 textshaping_1.0.4
## [53] evaluate_1.0.5 lattice_0.22-7 backports_1.5.0 memoise_2.0.1
## [57] bslib_0.9.0 class_7.3-23 Rcpp_1.1.0 uuid_1.2-1
## [61] svglite_2.2.2 xfun_0.53 fs_1.6.6 pkgconfig_2.0.3
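A package bibliography like the one above can be generated automatically rather than maintained by hand. The sketch below uses knitr's `write_bib()` to emit BibTeX entries for the currently loaded packages; the output file name `packages.bib` is an assumption for illustration, and you would point your YAML `bibliography` field at whatever file you choose.

```r
# Write BibTeX entries for all currently attached packages
# (plus knitr and rmarkdown) to a .bib file that the YAML
# header can reference via the `bibliography` field.
knitr::write_bib(c(.packages(), "knitr", "rmarkdown"),
                 file = "packages.bib")
```

Rerunning this chunk each time you knit keeps the citation versions in sync with the packages actually used in the session, which is the same goal `sessionInfo()` serves for the appendix above.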